Approximating K-means-type Clustering via Semidefinite Programming

نویسندگان

  • Jiming Peng
  • Yu Wei
چکیده

One of the fundamental clustering problems is to assign n points into k clusters based on the minimal sum-of-squares(MSSC), which is known to be NP-hard. In this paper, by using matrix arguments, we first model MSSC as a so-called 0-1 semidefinite programming (SDP). We show that our 0-1 SDP model provides an unified framework for several clustering approaches such as normalized k-cut and spectral clustering. Moreover, the 0-1 SDP model allows us to solve the underlying problem approximately via the relaxed linear and semidefinite programming. Secondly, we consider the issue of how to extract a feasible solution of the original MSSC model from the approximate solution of the relaxed SDP problem. By using principal component analysis, we develop a rounding procedure to construct a feasible partitioning from a solution of the relaxed problem. In our rounding procedure, we need to solve a k-means clustering problem in <k−1, which can be solved in O(n 2(k−1)) time. In case of bi-clustering, the running time of our rounding procedure can be reduced to O(n log n). We show that our algorithm can provide a 2-approximate solution to the original problem. Promising numerical results based on our new method are reported.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Advanced Optimization Laboratory Title: Approximating K-means-type clustering via semidefinite programming

One of the fundamental clustering problems is to assign n points into k clusters based on the minimal sum-of-squares(MSSC), which is known to be NP-hard. In this paper, by using matrix arguments, we first model MSSC as a so-called 0-1 semidefinite programming (SDP). We show that our 0-1 SDP model provides an unified framework for several clustering approaches such as normalized k-cut and spectr...

متن کامل

Clustering subgaussian mixtures by semidefinite programming

We introduce a model-free relax-and-round algorithm for k-means clustering based on a semidefinite relaxation due to Peng and Wei [PW07]. The algorithm interprets the SDP output as a denoised version of the original data and then rounds this output to a hard clustering. We provide a generic method for proving performance guarantees for this algorithm, and we analyze the algorithm in the context...

متن کامل

A new theoretical framework for K-means-type clustering

One of the fundamental clustering problems is to assign n points into k clusters based on the minimal sum-of-squares(MSSC), which is known to be NP-hard. In this paper, by using matrix arguments, we first model MSSC as a so-called 0-1 semidefinite programming (SDP). The classical K-means algorithm can be interpreted as a special heuristics for the underlying 0-1 SDP. Moreover, the 0-1 SDP model...

متن کامل

Probably certifiably correct k-means clustering

Recently, Bandeira [5] introduced a new type of algorithm (the so-called probably certifiably correct algorithm) that combines fast solvers with the optimality certificates provided by convex relaxations. In this paper, we devise such an algorithm for the problem of k-means clustering. First, we prove that Peng and Wei’s semidefinite relaxation of k-means [20] is tight with high probability und...

متن کامل

Robustness of SDPs for Partial Recovery of Clustering Subgaussian Mixtures

In this paper, we examine the robustness of a relax-and-round k-means clustering procedure, a method for clustering subgaussian mixtures using semidefinite programming first introduced in [MVW16]. We are interested in the robustness of the algorithm when there is an adversarial corruption of N points each through distance at most R0. We show that under such corruption this specific algorithm we...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • SIAM Journal on Optimization

دوره 18  شماره 

صفحات  -

تاریخ انتشار 2007